ui hierarchy
A Reference Data Model for Process-Related User Interaction Logs
Abb, Luka, Rehse, Jana-Rebecca
User interaction (UI) logs are high-resolution event logs that record low-level activities performed by a user during the execution of a task in an information system. Each event in a UI log corresponds to a single interaction between the user and the interface, such as clicking a button or entering a string into a text field. UI logs are used for purposes like task mining or robotic process automation (RPA), but each study and tool relies on a different conceptualization and implementation of the elements and attributes that constitute user interactions. This lack of standardization makes it difficult to integrate UI logs from different sources and to combine tools for UI data collection with downstream analytics or automation solutions. To address this, we propose a universally applicable reference data model for process-related UI logs. Based on a review of scientific literature and industry solutions, this model includes the core attributes of UI logs, but remains flexible with regard to the scope, level of abstraction, and case notion. We provide an implementation of the model as an extension to the XES interchange standard for event logs and demonstrate its practical applicability in a real-life RPA scenario.
Understanding user interfaces with screen parsing
This blog post summarizes our paper Screen Parsing: Towards Reverse Engineering of UI Models from Screenshots, which was published in the proceedings of UIST 2021. Machines that understand and operate user interfaces (UIs) on behalf of users could offer many benefits. For example, a screen reader (e.g., VoiceOver and TalkBack) could facilitate access to UIs for blind and visually impaired users, and task automation agents (e.g., Siri Shortcuts and IFTTT) could allow users to automate repetitive or complex tasks with their devices more efficiently. These benefits are gated on how well these systems can understand an underlying app's UI by reasoning about 1) the functionality present, 2) how its different components work together, and 3) how it can be operated to accomplish some goal. Many rely on the availability of UI metadata (e.g., the view hierarchy and the accessibility hierarchy), which provide some information about what elements are present and their properties.